
Document Management for Web Specs

Aaargh! Maintaining specs is a Royal Pain! We need to automate this!

See also:

Requirements

Goals

Wishes

Possible Solutions

FrameMaker, WebMaker, ??? print-to-text tool
This is what Roy Fielding (and a lot of other folks) use.
HTML+, dsr's tools
Dave Raggett edits the HTML with a text editor (mostly BBEdit on a Mac). He's got some little tools written in C to produce plain text.
Snafu DTD, gf tools, Texi2HTML, COST, Joe English
This is what I ended up using for HTML 2.0.
LinuxDoc
LaTeX, latex2html, IETF print-to-text tools
MS Word, rtf2html, ??? print-to-text tool

Ideal Solution

Source format: HTML dialect
Use a strict HTML dialect with tables, class=abstract, and possibly math.
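To make "strict" concrete, here is a minimal python sketch of a dialect checker built on the standard html.parser module. The element whitelist and the rule that class may only take the value abstract are assumptions for illustration; the real dialect would be pinned down by a DTD and checked with an SGML parser.

```python
from html.parser import HTMLParser

# Hypothetical element whitelist: roughly HTML 2.0 plus tables and <div>.
ALLOWED = {
    "html", "head", "title", "body", "h1", "h2", "h3", "h4", "h5", "h6",
    "p", "ul", "ol", "li", "dl", "dt", "dd", "pre", "blockquote",
    "address", "a", "em", "strong", "code", "b", "i", "tt", "br", "hr",
    "table", "caption", "tr", "th", "td", "div",
}
# Hypothetical rule: the only class value the dialect defines is "abstract".
ALLOWED_CLASSES = {"abstract"}


class DialectChecker(HTMLParser):
    """Collect (line, message) pairs for markup outside the strict dialect."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()   # position of the tag being reported
        if tag not in ALLOWED:
            self.problems.append((line, f"element <{tag}> not in dialect"))
        for name, value in attrs:
            if name == "class" and value not in ALLOWED_CLASSES:
                self.problems.append((line, f"class={value!r} not in dialect"))


def check(source):
    """Return the dialect violations found in an HTML source string."""
    checker = DialectChecker()
    checker.feed(source)
    checker.close()
    return checker.problems
```

Running check() over every spec file before each release would catch stray presentational markup early, while it is still cheap to fix.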
Document Manipulation API: python/ILU
There are lots of web libraries for python. We could eventually specify the interfaces in ILU and use them from lots of languages (C, C++, Java, Scheme, Common Lisp, Modula-3), but we'd prototype and develop using python.
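As a small example of what python's standard web libraries already make easy, here is a sketch of link relativization using urllib.parse and posixpath. It is a hypothetical illustration, not the author's existing tool; the function name relativize and the rule that only same-scheme, same-host links are rewritten are assumptions.

```python
import posixpath
from urllib.parse import urlsplit


def relativize(base, target):
    """Rewrite target as a link relative to the document at base.

    Only links with the same scheme and host are rewritten; anything else
    is returned unchanged because it cannot be expressed relatively.
    """
    b, t = urlsplit(base), urlsplit(target)
    if (b.scheme, b.netloc) != (t.scheme, t.netloc):
        return target
    base_dir = posixpath.dirname(b.path) or "/"
    rel = posixpath.relpath(t.path or "/", base_dir)
    if t.path.endswith("/") and not rel.endswith("/"):
        rel += "/"          # relpath drops the trailing slash of a directory URL
    if t.query:
        rel += "?" + t.query
    if t.fragment:
        rel += "#" + t.fragment
    return rel
```

With base = "http://www.w3.org/pub/WWW/MarkUp/html-spec/index.html", relativize(base, "http://www.w3.org/pub/WWW/MarkUp/SGML/") returns "../SGML/", and a link to another host comes back untouched.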

I've already written little tools to do things like relativize links. Rather than doing TOC generation, section numbering, etc. during translation, we'd do it in place in the source, but automatically.
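Here is a minimal python sketch of section numbering and TOC generation done in place. It assumes (hypothetically) that sections are marked with bare <h2>/<h3> tags, that the TOC lives between <!-- toc --> and <!-- /toc --> comments, and that a leading number in a heading was put there by a previous run, so the script can be rerun on the same source as often as needed.

```python
import re

# Hypothetical conventions for the source files.
TOC_START, TOC_END = "<!-- toc -->", "<!-- /toc -->"
HEADING = re.compile(r"<h([23])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
OLD_NUMBER = re.compile(r"^\d+(?:\.\d+)*\s+")   # number left by a previous run


def renumber(source):
    """Number <h2>/<h3> headings and rebuild the TOC, in place in the source."""
    counters = [0, 0]            # [section, subsection]
    entries = []                 # (level, label, title) for the TOC

    def number(match):
        tag = match.group(1)
        level = int(tag) - 2     # 0 for <h2>, 1 for <h3>
        counters[level] += 1
        if level == 0:
            counters[1] = 0      # a new section restarts subsection numbering
        label = ".".join(str(n) for n in counters[: level + 1])
        title = OLD_NUMBER.sub("", match.group(2).strip())
        entries.append((level, label, title))
        return f"<h{tag}>{label} {title}</h{tag}>"

    body = HEADING.sub(number, source)

    start, end = body.find(TOC_START), body.find(TOC_END)
    if start == -1 or end < start:
        return body              # no TOC markers: just renumber the headings
    items = "".join(f"{'  ' * level}<li>{label} {title}\n"
                    for level, label, title in entries)
    toc = "<ul>\n" + items + "</ul>"
    return body[: start + len(TOC_START)] + "\n" + toc + "\n" + body[end:]
```

Because old numbers are stripped before new ones are assigned, renumber(renumber(doc)) equals renumber(doc). The trade-off is that a heading whose real title starts with a number would lose it.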

Chunking support: python scripts
This would handle chunking many HTML documents into one for printing, and many-to-many chunking for author/reader convenience.
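A sketch of the many-to-one case in python, under these assumptions: each chunk is given as a (filename, html) pair, links between chunks are written as href="file.html" or href="file.html#name", and anchor names are unique across the whole set. Regular expressions stand in for a real SGML parser, and the function name merge is hypothetical.

```python
import re

BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
# href="file.html" or href="file.html#name"; group 1 = file, group 2 = #name
LINK = re.compile(r'href="([^"#]*)(#[^"]*)?"', re.IGNORECASE)


def merge(title, chunks):
    """Concatenate several HTML chunks into one document for printing.

    chunks is a list of (filename, html) pairs in reading order. Links that
    point at another chunk are retargeted to anchors inside the merged file.
    """
    names = {name for name, _ in chunks}

    def retarget(match):
        doc, frag = match.group(1), match.group(2)
        if doc not in names:
            return match.group(0)          # external or same-page link: unchanged
        return f'href="{frag or "#" + doc}"'

    parts = []
    for name, html in chunks:
        found = BODY.search(html)
        content = found.group(1) if found else html   # <body> tags may be omitted
        content = LINK.sub(retarget, content.strip())
        parts.append(f'<a name="{name}"></a>\n{content}')

    return (f"<html><head><title>{title}</title></head><body>\n"
            + "\n<hr>\n".join(parts)
            + "\n</body></html>\n")
```

Each chunk gets an anchor named after its file, so a link to a whole chunk still lands at its start once everything is in one file.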
PostScript Output: python implementation of Mosaic print tool
This code is already written. Guido translated the postscript printing code from Mosaic into python. We could adapt things like headers/footers for our needs. This eliminates the need for a TeX installation.
PostScript Output: libwww TeX module?
Use the HTTeXGen module in libwww to generate TeX. It doesn't currently support all the features we need, but it could work. It would rely on a many-to-one HTML-to-HTML filter.
PostScript Output: html2lout?
Lout is kinda like TeX, but it was written after the dawn of PostScript, so there's less redundancy between lout and PS than between TeX and PS. The syntax of lout is also cleaner. Lout has table, equation, etc. packages. A clean html2lout filter should be much more reliable and hands-free than anything based on TeX.
Plain-Text output: custom python app?
There is already python code to do simple HTML-to-text formatting, but support for multiple documents, tables, etc. needs to be added, as well as IETF style.
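A minimal sketch of the plain-text side, using html.parser and textwrap from the python standard library. It is not the existing python code referred to here. The layout rules (headings flush left, body text indented three spaces, list items marked with "o", lines at most 72 characters) approximate RFC-style text and are assumptions; tables and page headers/footers are not handled.

```python
import textwrap
from html.parser import HTMLParser

WIDTH = 72          # assumed maximum line length for RFC-style text


class TextFormatter(HTMLParser):
    """Turn a single HTML document into indented, wrapped plain text."""

    BLOCKS = {"p", "h1", "h2", "h3", "h4", "li", "ul", "ol", "pre", "blockquote"}
    HEADINGS = {"h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []     # finished, formatted blocks
        self.buf = []            # raw text of the block being read
        self.kind = "p"          # element that owns the current text
        self.skip = False        # True inside <title>

    def flush(self):
        text, self.buf = "".join(self.buf), []
        if self.kind == "pre":               # keep preformatted lines as-is
            lines = text.strip("\n").splitlines()
            if lines:
                self.paragraphs.append("\n".join("   " + line for line in lines))
            return
        text = " ".join(text.split())        # collapse HTML whitespace
        if not text:
            return
        if self.kind in self.HEADINGS:
            first = rest = ""                # headings start in column 0
        elif self.kind == "li":
            first, rest = "   o  ", "      "
        else:
            first = rest = "   "
        self.paragraphs.append(textwrap.fill(
            text, width=WIDTH, initial_indent=first, subsequent_indent=rest))

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.skip = True
        elif tag in self.BLOCKS:
            self.flush()
            self.kind = tag

    def handle_endtag(self, tag):
        if tag == "title":
            self.skip = False
        elif tag in self.BLOCKS:
            self.flush()
            self.kind = "p"

    def handle_data(self, data):
        if not self.skip:
            self.buf.append(data)


def html_to_text(html):
    formatter = TextFormatter()
    formatter.feed(html)
    formatter.close()
    formatter.flush()                        # emit whatever text is left over
    return "\n\n".join(formatter.paragraphs) + "\n"
```

Handling multiple documents would mean feeding the output of a many-to-one merge into html_to_text rather than extending the formatter itself.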
Plain-Text output: libwww module?
The same feature enhancements would be needed.

References

Postscript
Python
SGML
LinuxDocSGML
lout
Joe English
Eric Raymond
Linux, computational linguistics, www-html <199512221800.NAA09004@locke.ccil.org>


Connolly
last update by $Author: connolly $ on $Date: 1996/05/03 22:14:14 $